A Kronecker-factored approximate Fisher matrix for convolution layers

نویسندگان

  • Roger B. Grosse
  • James Martens
چکیده

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ILU and IUL factorizations obtained from forward and backward factored approximate inverse algorithms

In this paper‎, ‎an efficient dropping criterion has been used to compute the IUL factorization obtained from Backward Factored APproximate INVerse (BFAPINV) and ILU factorization obtained from Forward Factored APproximate INVerse (FFAPINV) algorithms‎. ‎We use different drop tolerance parameters to compute the preconditioners‎. ‎To study the effect of such a dropping on the quality of the ILU ...

متن کامل

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network’s Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large block...

متن کامل

Kronecker-factored Curvature Approxima- Tions for Recurrent Neural Networks

Kronecker-factor Approximate Curvature (Martens & Grosse, 2015) (K-FAC) is a 2nd-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization tasks (Ba et al., 2017). It is based on an approximation to the Fisher information matrix (FIM) that makes assumptions about the particular structure of the network and the way it is parame...

متن کامل

Kronecker operational matrices for fractional calculus and some applications

The problems of systems identification, analysis and optimal control have been recently studied using orthogonal functions. The specific orthogonal functions used up to now are the Walsh, the block-pulse, the Laguerre, the Legendre, Haar and many other functions. In the present paper, several operational matrices for integration and differentiation are studied. we introduce the Kronecker convol...

متن کامل

Complete pivoting strategy for the $IUL$ preconditioner obtained from Backward Factored APproximate INVerse process

‎In this paper‎, ‎we use a complete pivoting strategy to compute the IUL preconditioner obtained as the by-product of the Backward Factored APproximate INVerse process‎. ‎This pivoting is based on the complete pivoting strategy of the Backward IJK version of Gaussian Elimination process‎. ‎There is a parameter $alpha$ to control the complete pivoting process‎. ‎We have studied the effect of dif...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016